1 Mount Hood Environmental, PO Box 1303, Challis, Idaho, 83226, USA
2 Mount Hood Environmental, 39085 Pioneer Boulevard #100 Mezzanine, Sandy, Oregon, 97055, USA
3 Mount Hood Environmental, PO Box 4282, McCall, Idaho, 83638, USA
✉ Correspondence: Bryce N. Oldemeyer <bryce.oldemeyer@mounthoodenvironmental.com>, Mark Roes <mark.roes@mthoodenvironmental.com>
Quantile random forest (QRF) models have become popular for quantifying freshwater habitat carrying capacity because their flexible framework avoids common pitfalls associated with noisy data, correlated variables, and non-linear relationships. Recently, three QRF models were fit using fish-habitat data from the Columbia Habitat Monitoring Program (CHaMP) and used to estimate habitat carrying capacity of wadable streams for ESA-listed populations of Chinook salmon and steelhead during three critical life stages (juvenile summer parr, juvenile winter presmolt, and adult redds) within the Columbia River Basin. The covariates included in those models were selected from >100 potential covariates and chosen for their high predictive power (Appendix B of Idaho OSC Team, 2019; See et al. 2021). While the covariates in those models likely provide high predictive power, many are not informative for restoration project design or monitoring, and some are difficult to measure or replicate using streamlined fish habitat protocols (DASH; Carmichael et al. 2019). To increase the utility of the QRF models for restoration efforts and future data collection, we explored alternative covariates that: 1) maintained high predictive power, 2) could be calculated from DASH surveys, 3) were informative for restoration project development and monitoring, 4) were not missing an overabundance of data in the CHaMP dataset, and 5) were not highly correlated with other covariates in the models (to avoid overfitting). Additionally, we wanted to test the assumption made during development of the original QRF models that a single model was appropriate for both Chinook salmon and steelhead during each of the three life stages.
Similarly, a random forest (RF) model was used to predict habitat capacity across larger spatial scales where CHaMP and/or DASH data were not available (cite IRA). We revisited the globally available attributes (GAAs) included in the original RF extrapolation model and made minor modifications that retained covariates with high predictive power and added covariates that better aligned with the revised QRF model covariates. To compare relative performance between the original and modified QRF/RF models, we evaluated watershed carrying capacity estimates produced by both sets of models for eight watersheds located within the Upper Salmon River basin.
Through this process, we successfully developed modified QRF/RF models that were more informative for restoration design and monitoring, included covariates that could be calculated using newly developed stream habitat protocols, and maintained a similar level of predictive power as the original QRF/RF models. Below is a description of the process.
Habitat covariates for the QRF habitat capacity models were generated from the CHaMP dataset or obtained from other publicly available sources (e.g. NorWest stream temperature data). In total, 129 covariates were included in the selection process. Covariates were aggregated into eleven metric categories and 1-4 covariates were chosen from each category based on the criteria below.
How strong was the relationship between the covariate and the response variable (based on the maximal information coefficient, or MIC, score)?
Could the covariate be calculated using DASH data?
Was the covariate informative for restoration efforts?
How much data were missing, and how many “0” values were recorded, for the covariate in the fish-habitat dataset?
How correlated was the covariate with other covariates within the same metric category, particularly covariates with higher MIC scores?
Below is a simplified, theoretical example of how a covariate might be selected for a model.
–
In the original QRF model, discharge was included as a covariate because it had a high MIC score and made biological sense (i.e. discharge strongly influences fish habitat use and, presumably, habitat carrying capacity). Unfortunately, discharge is not very informative for restoration efforts because most restoration actions cannot create water. Discharge, like many habitat covariates, is highly correlated with other habitat covariates that may have been left out of the original QRF model for any number of reasons (e.g. they were highly correlated with covariates already in the model, or were excluded to avoid overfitting). Applying the rubric shows that average thalweg depth has a MIC score nearly as high as that of discharge, is informative for restoration efforts, can be calculated with DASH, and is highly correlated with discharge. Based on this information, average thalweg depth would be substituted for discharge in the model.
–
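The rubric and the thalweg-depth example above can be sketched in code. This is a minimal illustration rather than the procedure actually used: the `Candidate` fields, the thresholds (`max_missing`, `max_corr`, `max_keep`), the "Wetted Width" candidate, and every number below are hypothetical, and MIC scores and pairwise correlations are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mic: float                  # association with the fish response (0-1)
    dash_computable: bool       # can it be derived from a DASH survey?
    restoration_relevant: bool  # does it inform restoration design or monitoring?
    frac_missing: float         # share of sites with missing (or zero) values


def select_covariates(candidates, abs_corr, max_keep=4,
                      max_missing=0.25, max_corr=0.7):
    """Greedy pick within one metric category, strongest MIC first."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c.mic, reverse=True):
        if not (cand.dash_computable and cand.restoration_relevant):
            continue
        if cand.frac_missing > max_missing:
            continue
        # Skip covariates that duplicate information already selected.
        redundant = any(
            abs_corr.get(frozenset((cand.name, kept.name)), 0.0) > max_corr
            for kept in chosen
        )
        if redundant:
            continue
        chosen.append(cand)
        if len(chosen) == max_keep:
            break
    return [c.name for c in chosen]


# Illustrative values only; none of these numbers come from the CHaMP data.
size_category = [
    Candidate("Discharge", 0.42, True, False, 0.05),
    Candidate("Average Thalweg Depth", 0.40, True, True, 0.02),
    Candidate("Residual Pool Depth", 0.31, True, True, 0.10),
    Candidate("Wetted Width", 0.30, True, True, 0.60),
]
abs_corr = {
    frozenset(("Discharge", "Average Thalweg Depth")): 0.85,
    frozenset(("Average Thalweg Depth", "Residual Pool Depth")): 0.45,
}

selected = select_covariates(size_category, abs_corr)
print(selected)
```

In this toy case discharge is passed over because it does not inform restoration, and average thalweg depth, its close correlate, is retained in its place. In practice the final choice also relied on judgment that a fixed threshold cannot capture.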
Last, the covariate selection process was done independently for both species for all three life stages to test the assumption made during the original QRF model development that it was appropriate to apply the same life stage models to both species.
There were 12-14 covariates selected for each of the six QRF habitat capacity models. While the relative importance of the final covariates in the three life stage models differed between species, the final covariates themselves were nearly identical (Figure 2.1, Figure 2.2, and Figure 2.3). This confirmed the original assumption that it was appropriate to use one model for each life stage for both species. Because of this, we consolidated the species-specific models into a single winter juvenile, summer juvenile, and redd model to be used for both species (Table 2.1). Partial dependence plots for the modified QRF habitat capacity models can be found in Section @ref(modified-qrf-habitat-capacity-model---partial-dependence-plots).
Figure 2.1: Relative importance plots for covariates included in the modified juvenile summer QRF models
Figure 2.2: Relative importance plots for covariates included in the modified juvenile winter QRF models
Figure 2.3: Relative importance plots for covariates included in the modified QRF redds models
| Name | Metric Category | Juv Sum Chnk | Juv Sum Sthd | Juv Win Chnk | Juv Win Sthd | Redds Chnk | Redds Sthd | Description |
|---|---|---|---|---|---|---|---|---|
| Channel Unit Frequency | ChannelUnit | 5 | 9 | 5 | 3 | 1 | 1 | Number of channel units per 100 meters. |
| Fast NonTurbulent Frequency | ChannelUnit | 6 | 13 | – | – | 13 | 4 | Number of Fast Water Non-Turbulent channel units per 100 meters. |
| Sinuosity | Complexity | 13 | 7 | 10 | 10 | 10 | 12 | Ratio of the thalweg length to the straight line distance between the start and end points of the thalweg. |
| Wetted Channel Braidedness | Complexity | 14 | 14 | 13 | 13 | – | – | Ratio of the total length of the wetted mainstem channel plus side channels and the length of the mainstem channel. |
| Fish Cover: Some Cover | Cover | 8 | 4 | 8 | 8 | 9 | 3 | Percent of wetted area with some form of fish cover |
| Large Wood Density | Cover | – | – | 4 | 5 | – | – | Large Wood per sq meter |
| Residual Depth | Size | – | – | 2 | 2 | – | – | Average residual depth of the channel unit. |
| Average Thalweg Depth | Size | 1 | 3 | – | – | 2 | 2 | Average Thalweg Depth, meters |
| Thalweg Exit Depth Avg | Size | – | – | 6 | 7 | – | – | Depth of the thalweg at the downstream edge of the channel unit. |
| Gradient | Size | 3 | 2 | 7 | 1 | 4 | 6 | Site water surface gradient is calculated as the difference between the top of site (upstream) and bottom of site (downstream) water surface elevations divided by thalweg length. |
| Residual Pool Depth | Size | 12 | 10 | – | – | 11 | 5 | The average difference between the maximum depth and downstream end depth of all Slow Water/Pool channel units. |
| Discharge | Size | – | – | 3 | 4 | – | – | The sum of station discharge across all stations. Station discharge is calculated as depth x velocity x station increment for all stations except first and last. Station discharge for first and last station is 0.5 x station width x depth x velocity. |
| Substrate Est: Boulders | Substrate | 10 | 12 | – | – | 8 | 11 | Percent of boulders (256-4000 mm) within the wetted site area. |
| Substrate Est: Cobble and Boulder | Substrate | – | – | 11 | 11 | – | – | Total cobble plus boulder percentage |
| Substrate Est: Cobbles | Substrate | 11 | 6 | – | – | 5 | 8 | Percent of cobbles (64-256 mm) within the wetted site area. |
| Substrate Est: Coarse and Fine Gravel | Substrate | 7 | 8 | 12 | 12 | 7 | 13 | Percent of coarse and fine gravel (2-64 mm) within the wetted site area. |
| Substrate Est: Sand and Fines | Substrate | 9 | 5 | 9 | 9 | 6 | 7 | Percent of sand and fine sediment (0.01-2 mm) within the wetted site area. |
| Avg. August Temperature | Temperature | 2 | 1 | – | – | 3 | 10 | Average predicted daily August temperature from NorWest, averaged across the years 2002-2011. |
| Elevation | Temperature | – | – | 1 | 6 | – | – | Elevation, meters |
| Large Wood Frequency: Wetted | Wood | 4 | 11 | – | – | 12 | 9 | Number of large wood pieces per 100 meters within the wetted channel. |
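One way to put a number on how much covariate importance differs between species is a rank correlation of the importance ranks in the table above. The sketch below uses the juvenile summer columns; the choice of Spearman's rho is our own illustration and was not part of the original analysis.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two untied rankings of equal length."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Importance ranks for the 14 juvenile summer covariates, copied from the
# "Juv Sum Chnk" and "Juv Sum Sthd" columns of the table (top to bottom,
# skipping rows where the covariate is not in the summer model).
chinook = [5, 6, 13, 14, 8, 1, 3, 12, 10, 11, 7, 9, 2, 4]
steelhead = [9, 13, 7, 14, 4, 3, 2, 10, 12, 6, 8, 5, 1, 11]

rho = spearman_rho(chinook, steelhead)
print(f"Spearman rho, juvenile summer importance ranks: {rho:.2f}")
```

A value near 1 would indicate that both species rank the shared covariates almost identically, while a value near 0 would indicate unrelated rankings.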
The spatial extent of QRF capacity predictions is limited to reaches with high-resolution habitat data (i.e. CHaMP or DASH data). To estimate capacity outside that extent, we used an extrapolation model fit to “globally available attributes” (GAAs) obtained from a continuous, linear stream network created by Morgan Bond and Tyler Nodine (cite - based on the National Hydrography Dataset High Resolution 1:24,000). A random forest model was fit using the GAAs from the linear stream network and used to estimate habitat capacity for the entire Columbia River Basin at a 200-meter reach scale. Consistent with the QRF habitat capacity models, the RF extrapolation model makes no assumptions about the direction and distribution of predictor effects, and it constrains capacity estimates to the range of predictions produced by the QRF habitat capacity model. However, random forest methods do not account for variable strata weights across the CHaMP dataset, a source of potential bias that could be alleviated through the collection of additional paired fish and habitat data.
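The range constraint mentioned above follows from how tree ensembles predict: each tree returns the mean of training responses in a leaf, and the forest averages those means. The toy example below illustrates this with bagged one-split trees written from scratch. It is not the extrapolation model itself, and the gradient and capacity values are invented for illustration.

```python
import random


def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold minimizing squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # bootstrap sample held a single unique x value
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm


def fit_forest(xs, ys, n_trees=50, seed=1):
    """Average of stumps, each fit to a bootstrap resample of the data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Hypothetical training data: stream gradient (%) at surveyed sites and the
# QRF capacity predicted there. Values are made up.
gradient = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0]
qrf_capacity = [4.1, 3.8, 3.9, 3.0, 2.2, 1.5, 0.9, 0.6]

forest = fit_forest(gradient, qrf_capacity)

# Predict inside and far outside the surveyed gradient range.
predictions = [forest(g) for g in (0.0, 2.5, 20.0, 100.0)]
lo, hi = min(qrf_capacity), max(qrf_capacity)
print(predictions, (lo, hi))
```

However extreme the input, every prediction stays between the smallest and largest training response, which is why extrapolated capacities cannot exceed the range of the QRF predictions used for fitting.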
RF extrapolation model covariates were selected from the list of GAAs and evaluated for inclusion using relative importance plots (Figure 3.1, Figure 3.2, and Figure 3.3), partial dependence plots (Section @ref(modified-rf-extrapolation-model---partial-dependence-plots)), and correlations between covariates. We used the covariates included in the previous extrapolation as a starting point for selection. This resulted in the replacement of regime (an indicator of dominant precipitation type) with elevation and the removal of relative slope, which we found was redundant with gradient. Model results indicated that elevation was consistently one of the most important predictors. This was particularly true for the Chinook parr summer model, where capacity predictions were primarily driven by elevation.
Figure 3.1: Relative importance plots for covariates included in the modified juvenile summer RF extrapolation models
Figure 3.2: Relative importance plots for covariates included in the modified juvenile winter RF extrapolation models
Figure 3.3: Relative importance plots for covariates included in the modified redds RF extrapolation models
| Metric | Description |
|---|---|
| Gradient % | Stream gradient (%). |
| Sinuosity | Reach sinuosity. 1 = straight, > 1 = sinuous. |
| Alpine accumulation | Number of upstream cells in alpine terrain. |
| Fines accumulation | Number of upstream cells in fine grain lithologies. |
| Flow accumulation | Number of upstream DEM cells flowing into reach. |
| Gravel accumulation | Number of upstream cells in gravel producing lithologies. |
| Precipitation accumulation | Number of upstream cells weighted by average annual precipitation. |
| Floodplain width | Current unmodified floodplain width. |
| Avg Aug stream temperature | Historical composite scenario representing 10 year average August mean stream temperatures for 2002-2011 (Isaak et al. 2017). |
| Disturbance PCA 1 | Disturbance Classification PCA 1 Score (Whittier et al. 2011). |
| Natural PCA 1 | Natural Classification PCA 1 Score (Whittier et al. 2011). |
| Natural PCA 2 | Natural Classification PCA 2 Score (Whittier et al. 2011). |
| Elevation | Elevation at downstream end of reach |
Habitat carrying capacity was estimated with the modified QRF/RF models for Chinook salmon and steelhead for juvenile summer, juvenile winter, and redd life stages for eight watersheds in the Upper Salmon River Basin. Spatial domains for species were originally defined by Streamnet (https://www.streamnet.org/home/data-maps/gis-data-sets/) and modified based on expert knowledge from regional biologists.
Figure 3.4: Extrapolations of habitat capacity for Chinook salmon, by life-stage, for the eight watersheds within the Upper Salmon River Basin using the modified models.
| Watershed | Juv summer capacity | Summer SE | Juv winter capacity | Winter SE | Redd capacity | Redd SE |
|---|---|---|---|---|---|---|
| EF Salmon | 1,926,623 | 226,925.7 | 138,214 | 32,880.3 | 402 | 20.7 |
| Lemhi | 786,452 | 62,659.8 | 141,515 | 15,358.7 | 353 | 11.0 |
| NF Salmon | 339,275 | 50,147.9 | 70,462 | 10,409.3 | 166 | 7.8 |
| Pahsimeroi | 265,099 | 18,409.2 | 86,999 | 9,780.7 | 139 | 4.4 |
| Panther Cr | 1,219,542 | 118,369.5 | 201,265 | 22,296.2 | 448 | 16.5 |
| Upper Salmon | 3,301,286 | 352,419.5 | 166,522 | 45,582.4 | 575 | 28.6 |
| Valley Cr | 1,902,198 | 207,362.9 | 115,517 | 32,535.0 | 394 | 19.7 |
| Yankee Fork | 2,144,056 | 274,555.8 | 119,298 | 28,782.6 | 438 | 23.0 |
| Watershed | Juv summer capacity/km | Summer SE/km | Juv winter capacity/km | Winter SE/km | Redd capacity/km | Redd SE/km |
|---|---|---|---|---|---|---|
| EF Salmon | 12,335 | 1,452.9 | 885 | 210.5 | 3 | 0.1 |
| Lemhi | 5,766 | 459.4 | 1,038 | 112.6 | 3 | 0.1 |
| NF Salmon | 6,504 | 961.3 | 1,351 | 199.5 | 3 | 0.1 |
| Pahsimeroi | 5,146 | 357.3 | 1,689 | 189.8 | 3 | 0.1 |
| Panther Cr | 8,544 | 829.3 | 1,410 | 156.2 | 3 | 0.1 |
| Upper Salmon | 17,082 | 1,823.5 | 862 | 235.9 | 3 | 0.1 |
| Valley Cr | 15,833 | 1,726.0 | 961 | 270.8 | 3 | 0.2 |
| Yankee Fork | 14,967 | 1,916.6 | 833 | 200.9 | 3 | 0.2 |
Figure 3.5: Extrapolations of habitat capacity for steelhead, by life-stage, for the eight watersheds within the Upper Salmon River Basin using the modified models.
| Watershed | Juv summer capacity | Summer SE | Juv winter capacity | Winter SE | Redd capacity | Redd SE |
|---|---|---|---|---|---|---|
| EF Salmon | 252,597 | 15,520.5 | 337,682 | 36,795 | 413 | 24 |
| Lemhi | 310,577 | 9,082.3 | 363,898 | 27,441 | 441 | 18 |
| NF Salmon | 242,471 | 18,381.8 | 313,118 | 27,955 | 323 | 22 |
| Pahsimeroi | 159,705 | 6,225.1 | 205,921 | 13,951 | 198 | 8 |
| Panther Cr | 268,476 | 13,598.0 | 339,671 | 19,946 | 317 | 15 |
| Upper Salmon | 243,548 | 14,843.6 | 310,879 | 39,013 | 452 | 32 |
| Valley Cr | 176,048 | 10,707.6 | 288,579 | 31,329 | 365 | 26 |
| Yankee Fork | 197,926 | 12,378.9 | 341,310 | 38,555 | 449 | 36 |
| Watershed | Juv summer capacity/km | Summer SE/km | Juv winter capacity/km | Winter SE/km | Redd capacity/km | Redd SE/km |
|---|---|---|---|---|---|---|
| EF Salmon | 1,525 | 93.7 | 2,039 | 222.2 | 2 | 0.1 |
| Lemhi | 1,774 | 51.9 | 2,079 | 156.8 | 3 | 0.1 |
| NF Salmon | 2,049 | 155.3 | 2,646 | 236.2 | 3 | 0.2 |
| Pahsimeroi | 1,924 | 75.0 | 2,481 | 168.1 | 2 | 0.1 |
| Panther Cr | 2,105 | 106.6 | 2,664 | 156.4 | 2 | 0.1 |
| Upper Salmon | 1,485 | 90.5 | 1,895 | 237.8 | 3 | 0.2 |
| Valley Cr | 1,465 | 89.1 | 2,401 | 260.7 | 3 | 0.2 |
| Yankee Fork | 1,249 | 78.1 | 2,154 | 243.4 | 3 | 0.2 |
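The total and per-kilometer tables above can be cross-checked against each other. The sketch below assumes that capacity per km is simply total capacity divided by the stream length within each species' spatial domain. The source does not state that definition, so the implied lengths are a consistency check and not reported values.

```python
def implied_length_km(total_capacity, capacity_per_km):
    """Stream length implied by a total and a per-kilometer capacity."""
    return total_capacity / capacity_per_km


# East Fork Salmon values copied from the tables above: (total, per km).
ef_salmon = {
    ("Chinook", "juv summer"): (1_926_623, 12_335),
    ("Chinook", "juv winter"): (138_214, 885),
    ("steelhead", "juv summer"): (252_597, 1_525),
    ("steelhead", "juv winter"): (337_682, 2_039),
}

lengths = {key: implied_length_km(*vals) for key, vals in ef_salmon.items()}
for (species, stage), km in lengths.items():
    print(f"{species:9s} {stage}: {km:6.1f} km")
```

Under that assumption, both juvenile life stages imply roughly 156 km of stream for Chinook salmon and roughly 166 km for steelhead in the East Fork Salmon, which is consistent with each species having its own spatial domain.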
Comparisons of watershed capacity estimates from the original and modified QRF/RF models reveal modest differences in most cases, with the exception of Chinook parr summer capacities in several watersheds. Chinook parr summer capacity increased in all eight watersheds, by 13 to 222% relative to the previous extrapolation; the larger increases are likely due to the inclusion of elevation in the RF extrapolation model.
Figure 3.6: Comparison of Chinook salmon habitat capacity estimates between modified and original model extrapolation, by life-stage, for the eight watersheds within the Upper Salmon River Basin.
| Model | Watershed | Capacity per km | Total capacity | Capacity % change | Capacity SE |
|---|---|---|---|---|---|
| Juv summer | EF Salmon | 12,335.5 | 1,926,623 | 112 | 226,926 |
| Juv summer | Lemhi | 5,765.9 | 786,452 | 112 | 62,660 |
| Juv summer | NF Salmon | 6,503.6 | 339,275 | 13 | 50,148 |
| Juv summer | Pahsimeroi | 5,145.6 | 265,099 | 45 | 18,409 |
| Juv summer | Panther Cr | 8,543.7 | 1,219,542 | 21 | 118,369 |
| Juv summer | Upper Salmon | 17,081.6 | 3,301,286 | 163 | 352,419 |
| Juv summer | Valley Cr | 15,832.8 | 1,902,198 | 152 | 207,363 |
| Juv summer | Yankee Fork | 14,967.3 | 2,144,056 | 222 | 274,556 |
| Juv winter | EF Salmon | 884.9 | 138,214 | 0 | 32,880 |
| Juv winter | Lemhi | 1,037.5 | 141,515 | -8 | 15,359 |
| Juv winter | NF Salmon | 1,350.7 | 70,462 | 28 | 10,409 |
| Juv winter | Pahsimeroi | 1,688.7 | 86,999 | -8 | 9,781 |
| Juv winter | Panther Cr | 1,410.0 | 201,265 | 29 | 22,296 |
| Juv winter | Upper Salmon | 861.6 | 166,522 | -29 | 45,582 |
| Juv winter | Valley Cr | 961.5 | 115,517 | -12 | 32,535 |
| Juv winter | Yankee Fork | 832.8 | 119,298 | 20 | 28,783 |
| Redds | EF Salmon | 2.6 | 402 | -13 | 21 |
| Redds | Lemhi | 2.6 | 353 | 5 | 11 |
| Redds | NF Salmon | 3.2 | 166 | -5 | 8 |
| Redds | Pahsimeroi | 2.7 | 139 | 25 | 4 |
| Redds | Panther Cr | 3.1 | 448 | -4 | 17 |
| Redds | Upper Salmon | 3.0 | 575 | -20 | 29 |
| Redds | Valley Cr | 3.3 | 394 | -29 | 20 |
| Redds | Yankee Fork | 3.1 | 438 | -38 | 23 |
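The "Capacity % change" column can be inverted to approximate the original-model estimate that each modified estimate is compared against. The sketch below assumes the column is computed as 100 × (modified − original) / original. That formula is our reading of the table, and because the percentages are rounded the implied original is approximate.

```python
def percent_change(modified, original):
    """Percent change of the modified estimate relative to the original."""
    return 100.0 * (modified - original) / original


def implied_original(modified, pct_change):
    """Invert percent_change to recover the original estimate."""
    return modified / (1.0 + pct_change / 100.0)


# Yankee Fork, Chinook juvenile summer row from the table above.
modified_capacity = 2_144_056
reported_change = 222  # percent, rounded in the table

original_capacity = implied_original(modified_capacity, reported_change)
print(f"Implied original capacity: {original_capacity:,.0f}")

# Round trip: recomputing the change from the implied original.
print(f"Recomputed change: {percent_change(modified_capacity, original_capacity):.0f}%")
```

For this row the implied original estimate is on the order of 666,000 parr, so the modified models roughly triple the Yankee Fork summer capacity.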
Figure 3.7: Comparison of steelhead habitat capacity estimates between modified and original model extrapolations, by life-stage, for the eight watersheds within the Upper Salmon River Basin.
| Model | Watershed | Capacity per km | Total capacity | Capacity % change | Capacity SE |
|---|---|---|---|---|---|
| Juv summer | EF Salmon | 1,525.4 | 252,597 | -31 | 15,521 |
| Juv summer | Lemhi | 1,774.2 | 310,577 | -15 | 9,082 |
| Juv summer | NF Salmon | 2,048.7 | 242,471 | -5 | 18,382 |
| Juv summer | Pahsimeroi | 1,924.2 | 159,705 | -18 | 6,225 |
| Juv summer | Panther Cr | 2,105.3 | 268,476 | -8 | 13,598 |
| Juv summer | Upper Salmon | 1,484.6 | 243,548 | -31 | 14,844 |
| Juv summer | Valley Cr | 1,465.0 | 176,048 | -28 | 10,708 |
| Juv summer | Yankee Fork | 1,249.4 | 197,926 | -29 | 12,379 |
| Juv winter | EF Salmon | 2,039.2 | 337,682 | -14 | 36,795 |
| Juv winter | Lemhi | 2,078.7 | 363,898 | -8 | 27,441 |
| Juv winter | NF Salmon | 2,645.6 | 313,118 | -1 | 27,955 |
| Juv winter | Pahsimeroi | 2,481.0 | 205,921 | -4 | 13,951 |
| Juv winter | Panther Cr | 2,663.6 | 339,671 | 8 | 19,946 |
| Juv winter | Upper Salmon | 1,895.1 | 310,879 | -26 | 39,013 |
| Juv winter | Valley Cr | 2,401.4 | 288,579 | -14 | 31,329 |
| Juv winter | Yankee Fork | 2,154.4 | 341,310 | -18 | 38,555 |
| Redds | EF Salmon | 2.5 | 413 | -13 | 24 |
| Redds | Lemhi | 2.5 | 441 | 10 | 18 |
| Redds | NF Salmon | 2.7 | 323 | -10 | 22 |
| Redds | Pahsimeroi | 2.4 | 198 | 2 | 8 |
| Redds | Panther Cr | 2.5 | 317 | -7 | 15 |
| Redds | Upper Salmon | 2.8 | 452 | -11 | 32 |
| Redds | Valley Cr | 3.0 | 365 | -20 | 26 |
| Redds | Yankee Fork | 2.8 | 449 | -25 | 36 |
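Partial dependence plots show how the model's average predicted capacity changes across the range of a single covariate when the remaining covariates are held at their observed values. A rising, falling, or flat curve indicates the direction and relative strength of the modeled relationship, not a causal effect. Because the remaining covariates are averaged over, curves can be misleading where covariates are correlated with one another or where observations are sparse, typically near the extremes of a covariate's range.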
Figure 4.1: Partial dependence plots for covariates included in the modified juvenile summer QRF models
Figure 4.2: Partial dependence plots for covariates included in the modified juvenile summer QRF models
Figure 4.3: Partial dependence plots for covariates included in the modified juvenile winter QRF models
Figure 4.4: Partial dependence plots for covariates included in the modified juvenile winter QRF models
Figure 4.5: Partial dependence plots for covariates included in the modified QRF redds models
Figure 4.6: Partial dependence plots for covariates included in the modified QRF redds models
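For readers unfamiliar with how a partial dependence curve is built, the sketch below shows the standard calculation: force one covariate to each value on a grid, predict for every site, and average. The `toy_capacity` function is a hypothetical stand-in for a fitted model, and the sites and covariate names are invented, not CHaMP data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical stand-in for a fitted model; a real QRF would be used instead.
def toy_capacity(row):
    depth_effect = min(row["thalweg_depth"], 0.6)  # levels off past 0.6 m
    return 5.0 * depth_effect - 0.1 * row["gradient"]


# Made-up sites for illustration.
sites = [
    {"thalweg_depth": 0.2, "gradient": 1.0},
    {"thalweg_depth": 0.5, "gradient": 2.0},
    {"thalweg_depth": 0.9, "gradient": 3.0},
]

grid = [0.1, 0.3, 0.6, 1.0]
curve = partial_dependence(toy_capacity, sites, "thalweg_depth", grid)
for depth, capacity in zip(grid, curve):
    print(f"{depth:.1f} m -> {capacity:.2f}")
```

The curve rises with depth and then flattens, mirroring the threshold built into the toy model. Because every site is evaluated at every grid value, some combinations (for example a very deep, very steep reach) may never occur in the field, which is one reason such curves deserve caution when covariates are correlated.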
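Each partial dependence plot for the RF extrapolation models shows average predicted capacity across the range of one globally available attribute, with the other attributes held at their observed values. The curves describe modeled associations rather than causal effects and are least reliable where attributes are correlated or data are sparse.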
Figure 4.7: Partial dependence plots for covariates included in the modified juvenile summer RF extrapolation models
Figure 4.8: Partial dependence plots for covariates included in the modified juvenile summer RF extrapolation models
Figure 4.9: Partial dependence plots for covariates included in the modified juvenile summer RF extrapolation models
Figure 4.10: Partial dependence plots for covariates included in the modified juvenile summer RF extrapolation models
Figure 4.11: Partial dependence plots for covariates included in the modified juvenile winter RF extrapolation models
Figure 4.12: Partial dependence plots for covariates included in the modified juvenile winter RF extrapolation models
Figure 4.13: Partial dependence plots for covariates included in the modified juvenile winter RF extrapolation models
Figure 4.14: Partial dependence plots for covariates included in the modified juvenile winter RF extrapolation models
Figure 4.15: Partial dependence plots for covariates included in the modified redds RF extrapolation models
Figure 4.16: Partial dependence plots for covariates included in the modified redds RF extrapolation models
Figure 4.17: Partial dependence plots for covariates included in the modified redds RF extrapolation models
Figure 4.18: Partial dependence plots for covariates included in the modified redds RF extrapolation models